NSF PAR Search | NSF Public Access Repository

Enhancing interfacial thermal transport by grafting H-bonded polymer Chains: The role of chain morphology

https://doi.org/10.1016/j.apsusc.2025.163009

Islam, Md Mohaiminul; Liu, Ling (July 2025, Applied Surface Science)

Free, publicly-accessible full text available July 1, 2026
Antidote: Post-fine-tuning Safety Alignment for Large Language Models against Harmful Fine-tuning Attack

Huang, Tiansheng; Bhattacharya, Gautam; Joshi, Pratik; Kimball, Josh; Liu, Ling (July 2025, IEEE)

Safety-aligned Large Language Models (LLMs) are vulnerable to harmful fine-tuning attacks (Qi et al., 2023): a small amount of harmful data mixed into the fine-tuning dataset can break an LLM’s safety alignment. While several defenses have been proposed, our evaluation shows that existing defenses fail under certain training hyper-parameters: a large learning rate or a large number of training epochs in the fine-tuning stage can easily invalidate the defense. To this end, we propose Antidote, a post-fine-tuning stage solution that remains agnostic to the training hyper-parameters used in the fine-tuning stage. Antidote relies on the philosophy that, by removing the harmful parameters, the harmful model can be recovered from its harmful behaviors, regardless of how those harmful parameters were formed in the fine-tuning stage. With this philosophy, we introduce a one-shot pruning stage after harmful fine-tuning to remove the harmful weights that are responsible for the generation of harmful content. Despite its embarrassing simplicity, empirical results show that Antidote can reduce the harmful score while maintaining accuracy on downstream tasks.
Free, publicly-accessible full text available July 15, 2026
Booster: Tackling Harmful Fine-tuning for Large Language Models via Attenuating Harmful Perturbation

Huang, Tiansheng; Hu, Sihao; Ilhan, Fatih; Tekin, Selim F; Liu, Ling (April 2025, Proceedings of International Conference on Learning Representations (ICLR 2025))

Harmful fine-tuning attacks pose serious safety concerns for large language models offered through fine-tuning-as-a-service. While defenses have been proposed to mitigate the issue, their performance is still far from satisfactory, and the root cause of the problem has not been fully uncovered. In this paper, we show that harmful perturbation over the model weights could be a probable cause of broken alignment. To attenuate the negative impact of harmful perturbation, we propose an alignment-stage solution, dubbed Booster. Technically, along with the original alignment loss, we append a loss regularizer in the alignment stage’s optimization. The regularizer ensures that the model’s harmful loss reduction after the simulated harmful perturbation is attenuated, thereby mitigating the subsequent fine-tuning risk. Empirical results show that Booster can effectively reduce the harmful score of the fine-tuned models while maintaining performance on downstream tasks. Our code is available at https://github.com/git-disl/Booster.
Free, publicly-accessible full text available April 24, 2026
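
The regularizer described in the Booster abstract above can be illustrated with a toy, one-parameter sketch. This is not the authors' implementation: the quadratic losses, the names `harmful_loss`, `alignment_loss`, and `booster_style_objective`, the weight `lam`, the step size `alpha`, and the choice of a single normalized gradient step as the "simulated harmful perturbation" are all assumptions made for illustration.

```python
def harmful_loss(w):
    # Hypothetical stand-in for the loss on harmful data.
    return 0.5 * w * w


def harmful_grad(w):
    # Analytic gradient of the toy harmful loss above.
    return w


def alignment_loss(w):
    # Hypothetical stand-in for the original alignment loss.
    return 0.5 * (w - 1.0) ** 2


def booster_style_objective(w, lam=2.0, alpha=0.5):
    """Alignment loss plus a penalty on harmful-loss reduction.

    Assumption: the simulated harmful perturbation is one normalized
    gradient step of size alpha on the harmful loss.
    """
    g = harmful_grad(w)
    norm = abs(g)
    # Take the simulated harmful step; skip it if the gradient is zero.
    w_perturbed = w - alpha * g / norm if norm > 0 else w
    # How much the harmful loss would drop after that step.
    reduction = harmful_loss(w) - harmful_loss(w_perturbed)
    return alignment_loss(w) + lam * reduction
```

Minimizing an objective of this shape favors weights where one step of harmful training buys little harmful-loss reduction, which is the attenuation effect the abstract describes.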
Vaccine: Perturbation-aware Alignment for Large Language Models against Harmful Fine-tuning Attack

Huang, Tiansheng; Hu, Sihao; Liu, Ling (December 2024, Advances in Neural Information Processing Systems 37 (NeurIPS 2024))

Full Text Available
Vaccine: Perturbation-aware alignment for large language model

Huang, Tiansheng; Hu, Sihao; Liu, Ling (December 2024, The Thirty-Eighth Annual Conference on Neural Information Processing Systems (NeurIPS 2024))

The new paradigm of fine-tuning-as-a-service introduces a new attack surface for Large Language Models (LLMs): a small amount of harmful data uploaded by users can easily trick the fine-tuning process into producing a model with broken alignment. We conduct an empirical analysis and uncover a harmful embedding drift phenomenon, showing a probable cause of the broken alignment. Inspired by our findings, we propose Vaccine, a perturbation-aware alignment technique to mitigate the security risk of user fine-tuning. The core idea of Vaccine is to produce invariant hidden embeddings by progressively adding crafted perturbation to them in the alignment phase. This enables the embeddings to withstand harmful perturbation from un-sanitized user data in the fine-tuning phase. Our results on mainstream open-source LLMs (e.g., Llama2, Opt, Vicuna) demonstrate that Vaccine can boost the robustness of alignment against embedding drift induced by harmful prompts while preserving reasoning ability on benign prompts. Our code is available at https://github.com/git-disl/Vaccine.
Full Text Available
Lisa: Lazy Safety Alignment for Large Language Models against Harmful Fine-tuning Attack

Huang, Tiansheng; Hu, Sihao; Ilhan, Fatih; Tekin, Selim; Liu, Ling (December 2024, NeurIPS 2024)

Safety-aligned Large Language Models (LLMs) can be jail-broken by fine-tuning on a dataset mixed with harmful data. For the first time in the literature, we show that the jail-break effect can be mitigated by separating the fine-tuning stage into two states that respectively optimize over the alignment and user datasets. Unfortunately, our subsequent study shows that this simple Bi-State Optimization (BSO) solution experiences convergence instability when too few steps are invested in its alignment state, leading to downgraded alignment performance. By statistical analysis, we show that the excess drift towards the switching iterates of the two states could be a probable reason for the instability. To remedy this issue, we propose Lazy(i) safety alignment (Lisa), which introduces a proximal term to constrain the drift of each state. Theoretically, the benefit of the proximal term is supported by the convergence analysis, wherein we show that a sufficiently large proximal factor is necessary to guarantee Lisa’s convergence. Empirically, our results on four downstream fine-tuning tasks show that Lisa with a proximal term can significantly increase alignment performance while maintaining the LLM’s accuracy on the user tasks. Code is available at https://github.com/git-disl/Lisa.
Full Text Available
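
The alternation between an alignment state and a user state, with a proximal term limiting drift, can be sketched on a one-parameter toy problem. This is a hedged illustration only, not the released Lisa code: the quadratic losses, the names `run_state` and `bi_state_optimize`, the targets, the step counts, the learning rate `lr`, and the proximal factor `rho` are assumptions. The proximal term is modeled here as `0.5 * rho * (w - anchor) ** 2`, where `anchor` is the iterate at the moment the optimizer switches states.

```python
def run_state(w, target, steps, lr, rho):
    """Run one state: gradient descent on a toy loss plus a proximal term.

    Toy loss: 0.5 * (w - target)^2 (stand-in for alignment or user loss).
    Proximal term: 0.5 * rho * (w - anchor)^2, anchor = iterate at switch.
    """
    anchor = w
    for _ in range(steps):
        grad = (w - target) + rho * (w - anchor)
        w = w - lr * grad
    return w


def bi_state_optimize(w0, align_target, user_target, rho,
                      rounds=20, align_steps=5, user_steps=5, lr=0.1):
    """Alternate between the alignment state and the user state."""
    w = w0
    for _ in range(rounds):
        w = run_state(w, align_target, align_steps, lr, rho)
        w = run_state(w, user_target, user_steps, lr, rho)
    return w


# Drift of a single state from the same starting point, without and
# with the proximal term.
drift_plain = abs(run_state(0.0, 1.0, 5, 0.1, rho=0.0) - 0.0)
drift_prox = abs(run_state(0.0, 1.0, 5, 0.1, rho=1.0) - 0.0)
```

In this toy setting, a larger `rho` keeps each state closer to the iterate it started from, which mirrors the drift constraint described in the abstract; it says nothing about the convergence guarantees proved in the paper.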
Lisa: Lazy Safety Alignment for Large Language Models against Harmful Fine-tuning Attack

Huang, Tiansheng; Hu, Sihao; Ilhan, Fatih; Tekin, Selim Furkan; Liu, Ling (December 2024, Advances in Neural Information Processing Systems 37 (NeurIPS 2024))

Full Text Available
Personalized Privacy Protection Mask Against Unauthorized Facial Recognition.

Chow, Ka-Ho; Hu, Sihao; Huang, Tiansheng; Liu, Ling (September 2024, Proceedings of 2024 European Conference on Computer Vision (ECCV), Sept 29-Oct 4, 2024. https://github.com/git-disl/Chameleon)

Full Text Available
Robust Few-Shot Ensemble Learning with Focal Diversity-Based Pruning

Tekin, Selim Furkan; Ilhan, Fatih; Huang, Tiansheng; Hu, Sihao; Chow, Ka-Ho; Loper, Margaret; Liu, Ling (November 2024, The 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP 2024))

Full Text Available
A Deep Prediction Framework for Multi-Source Information via Heterogeneous GNN

Wu, Zhen; Zhou, Jingya; Zhang, Jinghui; Liu, Ling; Huang, Chizhou (July 2024, Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD2024) Pages 3460 - 3471)

Full Text Available
